Background: Patient satisfaction is crucial for the acceptance, use, and adherence to 
recommendations from teleconsultations regarding health care requests and triage services. 
Objectives: Our objectives are to systematically review the literature for multidimensional 
instruments that measure patient satisfaction after teleconsultation and triage and to compare 
these for content, reliability, validity, and factor analysis. 

Methods: We searched Medline, the Cumulative Index to Nursing and Allied Health Literature, 
and PsycINFO for literature on these instruments. Two reviewers independently screened all 
obtained references for eligibility and extracted data from the eligible articles. The results were 
presented using summary tables. 

Results: We included 31 publications, describing 16 instruments in our review. The reporting 
on test development and psychometric characteristics was incomplete. The development pro- 
cess, described by ten of 16 instruments, included a review of the literature (n=7), patient or 
stakeholder interviews (n=5), and expert consultations (n=3). Four instruments evaluated factor 
structure, reliability, and validity; two of those four demonstrated low levels of reliability for 
some of their subscales. 

Conclusion: A majority of instruments on patient satisfaction after teleconsultation showed 
methodological limitations and lacked rigorous evaluation. Users should carefully reflect on the 
content of the questionnaires and their relevance to the application. Future research should apply 
more rigorously established scientific standards for instrument development and psychometric 
evaluation. 

Keywords: teleconsultation, teletriage, triage, consultation, general practitioner, patient 
satisfaction, psychometric, evaluation, out-of-hours 

Introduction 

In recent years, telephone consultation and triage have gained popularity as a means 
for health care delivery.'-^ Teleconsultations and triage refer to "the process where 
calls from people with a health care problem are received, assessed, and managed by 
giving advice or via a referral to a more appropriate service."^ The main motive for 
introducing such services was to help callers to self-manage their health problems 
and to reduce unnecessary demands on other health care services. Teleconsultation 
and triage are frequently used in the context of out-of-hours primary care services.* 
They result in the counseling of patients about the appropriate level of care (general 
practitioner, specialized physician, other health care providers, [such as therapists], or 
hospital care), the appropriate time-to-treat (ranging from emergency care to seeking 
an appointment within a few weeks), or the potential for self-care. Several randomized 
controlled trials showed that teletriage is safe and effective,^"' 
and a systematic review suggested that at least one-half of the 
calls can be handled by telephone advice alone. ^ 

The patients' opinions on the quality of such services 
are crucial for their acceptance, use, and adherence to the 
recommendations resulting from the teleconsultation.' '" 
Instruments to measure patient satisfaction have been devel- 
oped for a broad range of settings. However, these instru- 
ments cannot easily be transferred into the teleconsultation 
setting, which systematically differs in two respects: 1) deci- 
sions in teleconsultation and triage rely heavily on medical 
history-taking as the main - and sometimes only - diagnostic 
tool, so excellent communication and history-taking skills are 
crucial in this setting; 2) teleconsultation and triage services 
generally relate to the appearance of new health problems 
and less frequently address long-term management for which 
patients usually attend face-to-face care.' 

Patient satisfaction is a multidimensional construct." '^ 
Global indices (single-item instruments) have been shown 
to be unreliable for the measurement of patient satisfaction 
in health care and to disguise the fact that judgments on 
different aspects of care may vary.'"'^ Instruments assessing 
patient satisfaction after teleconsultation and triage need to 
cover the perceived quality of the communication skills, 
of the telephone advice (eg, helpfulness and feasibility of 
the recommendation), and of the organizational issues of 
the service, such as access or waiting time.'" In a previous 
review, methodological issues related to the measurement 
of patient satisfaction with health care were systemati- 
cally collected.'" Several problems were addressed, such as 
how different ways of conducting surveys affect response 
rates and consumers' evaluations. However, the review did 
not include detailed information on patient satisfaction 
questionnaires, nor did it give specific recommendations 
related to questionnaire use. A more recent systematic 
review in 2006 on patient satisfaction with primary care 
out-of-hours services presented four questionnaires, all with 
important limitations in their development and evaluation 
process." 

However, out-of-hours care is only a small part of tele- 
consultation and triage services. Furthermore, none of the 
previous reviews explicitly followed up on research that 
modified and reevaluated existing instruments. Therefore, the 
aim of our study was to systematically review the scientific 
literature for multidimensional instruments that measure 
patient satisfaction after teleconsultation and triage for a 
health problem and to compare their development process, 
content, and psychometric properties. 



Methods 

Literature search 

We searched Medline, the Cumulative Index to Nursing and 
Allied Health Literature, and PsycINFO (query date of January 
31, 2013) for relevant literature. The search terms were related 
to "patient satisfaction", "questionnaire", and "triage" (Table 
S1). We reviewed the reference lists of all publications included 
in the final review for relevant articles. Furthermore, we 
searched the Internet for additional material, in particular for 
follow-up research and refinements of the included instruments, 
using the authors' names and the names of the instruments. 

Study selection and data collection 
process 

The pool of potentially relevant articles identified via data- 
bases, reference lists, and Internet searches was evaluated 
in detail regarding whether or not the articles were original 
research articles, whether or not they described instruments 
for assessing patient satisfaction after an encounter between a 
health professional and a patient or his proxy over the phone, 
and whether or not they were intended for self-administered 
or interviewer-administered use (Table 1).'"* As we were 
interested in multidimensional instruments, we excluded 
global indices (single-item measures). We included telephone 
and video consultations, as well as out-of-hours services 
that performed triage by phone. Out-of-hours services were 
defined as any request for medical care on public holidays, 
Sundays, and at a defined time on weekdays and Saturdays 
(for example, weekdays from 7 pm to 7 am and Saturdays 
from 1 pm onward). We included studies that reported the 
development of the instrument (called "development stud- 
ies") and studies that applied the instrument for outcome 
assessment (called "outcome studies"). We did not apply any 
language restriction. Two reviewers (MAI, EB) independently 
screened the references for eligibility, extracted the data, and 
allocated the instrument items to the predefined domains. 
Discrepancies were resolved by consensus. 

Data extraction and analysis 

We extracted the following information: 

1. Descriptive information: author; year of publication; country 
of origin; setting; staff providing the service; type of administration 
of the questionnaire; participants; and timing of 
administering the instrument after the encounter (Table 1). 

2. Instrument content: number of items per domain; number 
of domains covered per study; total number of items; 
mean items per domain; number of studies that covered 
a certain domain with at least one question (Table 2). 

Table 2 Instrument content (related to teleconsultation)

| Author | Year | Access to service | Attitude of health professional | Attitude of patient | Communication | Individual information* |
|---|---|---|---|---|---|---|
| Campbell et al | 2007 | 1 | 3 | 1 | 5 | 15 |
| Dehours et al | 2012 | 0 | 0 | 1 | 3 | 5 |
| Dixon and Williams | 1988 | 0 | 0 | 0 | 1 | 0 |
| Dixon and Sthal | 2009 | 0 | 1 | 0 | 2 | 0 |
| Garratt et al | 2010 | 0 | 4 | 0 | 1 | 0 |
| Hicks et al | 2003 | 1 | 1 | 2 | 0 | 0 |
| Keatinge and Rawlings | 2005 | 0 | 0 | 4 | 1 | 3 |
| McKinley et al | 1997 | 2 | 1 | 1 | 6 | 0 |
| McKinstry et al | 2002 | 0 | 0 | 1 | 3 | 0 |
| Mekhjian et al | 1999 | 0 | 0 | 5 | 4 | 0 |
| Moll van Charante et al | 2006 | 1 | 2 | 0 | 5 | 0 |
| Moscato et al† | 2003 | 0 | 1 | 3 | 2 | 2 |
| Rahmqvist et al | 2009 | 0 | 0 | 0 | 3 | 0 |
| Salisbury et al | 2005 | 1 | 1 | 0 | 0 | 0 |
| Strom et al | 2011 | 2 | 2 | 1 | 3 | 0 |
| Van Uden et al | 2005 | 2 | 1 | 2 | 2 | 0 |
| Number of studies that covered a certain domain with at least one question | | 7 | 10 | 10 | 14 | 4 |

Notes: *Sociodemographics; result of teleconsultation; †revised version as published by Beaulieu and Humphreys.

3. Details of the development process: such as literature 
review, consultation with experts, consensus, focus group 
meetings, or individual interviews; piloting; and rating 
scale (Table 3). 

4. Recruitment strategy and handling of nonresponders: 
inclusion and exclusion criteria; consecutive recruitment 
of patients; response rate; and nonresponse analysis 
(Table 4). 

5. Psychometric properties: item nonresponse; factor structure; 
reliability (ie, interrater, test/retest, intermethod, 
and internal consistency reliability); and validity (ie, 
construct, content, criterion validity) (Table 5). 



Table 3 Descriptive information of the instruments

| Author | Year | Development process | Piloting | Rating mode |
|---|---|---|---|---|
| Campbell et al | 2007 | Literature review, consultation with experts (no further specification) | Yes | 5-point Likert scale |
| Dehours et al | 2012 | Consensus of the working group | Yes | Yes/no, categorical, open-ended |
| Dixon and Williams | 1988 | NR | Yes | Yes/no |
| Dixon and Sthal | 2009 | NR | Yes | Numerical rating scale 1-5 |
| Garratt et al | 2010 | Literature review, consultation with experts, interview with patients | Yes | Unclear |
| Hicks et al | 2003 | NR | NR | 7-point Likert scale |
| Keatinge and Rawlings | 2005 | NR | Yes | Categorical |
| McKinley et al | 1997 | Literature review, focus group meetings with patients recruited from general practice registers and community groups led by a nonclinician | Yes | 5-point Likert scale |
| McKinstry et al | 2002 | NR | NR | Numerical rating scale 0-3 |
| Mekhjian et al | 1999 | Literature review | NR | 5-point Likert scale |
| Moll van Charante et al | 2006 | Literature review, interview of stakeholders | Yes | Numerical rating scale 1-10 |
| Moscato et al | 2003 | Qualitative interviews with adults who had received phone advice | Yes | 5-point Likert scale and check-off options |
| Rahmqvist et al | 2009 | NR | NR | 7-point Likert scale |
| Salisbury et al | 2005 | Literature review, use of McKinley questionnaire as a basis, development of draft short version | Yes | 5-point smiley faces (very dissatisfied to very satisfied) |
| Strom et al | 2011 | Multidisciplinary expert group decision, interview with patients | Yes | Visual analog scale 0-10 |
| Van Uden et al | 2005 | Literature review, interview of general practitioner's managers | NR | 5-point Likert scale |

Abbreviation: NR, not reported.
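
Table 2 Instrument content (related to teleconsultation), remaining columns: Management; Overall satisfaction; Professional skills; Telephone advice; Other; Number of domains covered per study; Total number of items

| Author | Number of domains covered per study | Total number of items |
|---|---|---|
| Campbell et al | 9 | 37 |
| Dehours et al | 6 | 24 |
| Dixon and Williams | 3 | 3 |
| Dixon and Sthal | 4 | 5 |
| Garratt et al | 4 | 10 |
| Hicks et al | 4 | 8 |
| Keatinge and Rawlings | 6 | 13 |
| McKinley et al | 6 | 20 |
| McKinstry et al | 3 | 5 |
| Mekhjian et al | 4 | 14 |
| Moll van Charante et al | 6 | 14 |
| Moscato et al | 7 | 15 |
| Rahmqvist et al | 5 | 7 |
| Salisbury et al | 5 | 7 |
| Strom et al | 8 | 14 |
| Van Uden et al | 7 | 22 |
| Mean | 5.4 | 13.6 |

Per-domain item counts as extracted (column assignment unclear): 3 1 0 1 0 3 1 3 0 1 0 3 1 1 1 4 | 12; 3 1 1 0 0 0 2 7 1 0 3 1 1 2 2 6 | 12.

Other: Diagnostics (8), training of staff (3); Technical aspects (1), Alternative to teleconsultation (1); Technical aspects (3), Access to pharmacy (1), Alternative to teleconsultation (1).
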



The data were tabulated and summarized descriptively. 

First, we listed all primary studies and extracted basic 
information. Outcome studies - that evaluated the same 
instrument in various settings and populations - were grouped 
under the corresponding development study. When several 
studies referred to the same instrument, we used the develop- 
ment study to extract data for the following steps. 

Second, we analyzed the content domains of the instruments. 
Based on a systematic review published by Garratt et al, we 
created a list of nine domains (access to the service, attitude of 
health professional, attitude of patient, perceived quality of the 
communication, individual information [such as sociodemographic 
or clinical patient data], management [such as waiting 
time], overall satisfaction, perceived quality of professional 
skills of the staff, perceived quality of the telephone advice 
[such as helpfulness and feasibility of the recommendation]), 
and other.'' Two reviewers independently attributed each item 
of the instruments to one domain. The aim of this procedure 
was to describe, to characterize, and to compare the content of 
patient satisfaction instruments for which no factor-analysis 
results were reported. We did not use these dimensions as a 
prerequisite for instruments to be included in our review. 

Third, we explored the development process of the instrument, 
its scoring scheme, and whether it had been piloted. 
When we identified only one study for an instrument, 
we extracted the data from this publication, regardless of 
whether it was a development study or an outcome study. 



Fourth, we assessed the recruitment strategy and handling 
of nonresponders in those publications that reported statistical 
results for psychometric properties. Because this information 
mainly serves the interpretation of the statistical results, we did 
not detail the recruitment strategy and handling of nonresponders 
for studies that did not report on factor structure, reliability, or 
validity. 

Fifth, we tabulated any type of psychometric property that 
we identified in any type of publication. For the interpretation 
of Cronbach's alpha values, an estimate of the reliability of 
an instrument, we used the categories: excellent (>0.9); good 
(0.8-0.9); acceptable (0.7-0.8); questionable (0.6-0.7); poor 
(0.5-0.6); and unacceptable (<0.5)." An item-total correlation 
of <0.3 was considered poor, indicating that the corresponding 
item does not correlate well with the overall scale."" 
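
The interpretation rules above can be illustrated with a short sketch. It computes Cronbach's alpha with the standard formula, maps the value to the categories listed above, and flags items with an item-total correlation below 0.3. The function names and the example scores are hypothetical and do not come from any of the reviewed studies. Two choices are assumptions of this sketch: shared boundary values (0.6, 0.7, 0.8) are assigned to the higher category, and the item-total correlation is computed against the sum of the remaining items rather than the full scale.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: one list of scores per item; all lists cover the same respondents."""
    k = len(items)
    n = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def interpret_alpha(alpha):
    """Categories as listed in the text; boundary handling is an assumption."""
    if alpha > 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.7:
        return "acceptable"
    if alpha >= 0.6:
        return "questionable"
    if alpha >= 0.5:
        return "poor"
    return "unacceptable"


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_total_correlations(items):
    """Correlate each item with the sum of the other items (assumed variant)."""
    n = len(items[0])
    result = []
    for idx, item in enumerate(items):
        rest = [sum(other[i] for j, other in enumerate(items) if j != idx)
                for i in range(n)]
        result.append(pearson(item, rest))
    return result


# Hypothetical scores: 3 items answered by 5 respondents on a 5-point scale.
example_items = [[4, 3, 5, 2, 4], [5, 3, 4, 2, 5], [4, 2, 5, 1, 4]]
alpha = cronbach_alpha(example_items)
label = interpret_alpha(alpha)
poor_items = [i for i, r in enumerate(item_total_correlations(example_items)) if r < 0.3]
```

With real questionnaire data, `example_items` would be replaced by the per-item response columns of one scale or subscale.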

Results 

Our search identified 3,651 references. We screened 224 full-text 
publications for eligibility and, ultimately, included 31 
studies - with a total of 17,797 patients - that reported on 16 
different multidimensional instruments on patient satisfaction 
after teleconsultation and triage (Figure 1; Table 1). All 
articles but one were published in English; the exception 
was published in Swedish." 

Basic information 

The instruments were developed in seven different countries: 
five instruments derived from the United Kingdom;'' '*"^' 

Figure 1 Flowchart. Potentially relevant references: n=224; excluded: n=203 (no triage, no teleconsultation, no patient satisfaction, no instrument, no health professionals, no patient, specific disease/diagnosis); relevant references: n=21; relevant references: n=31.


four instruments from the United States;^^"^' two from 
Sweden;"-^* two from the Netherlands;^^-^' and one instrument 
from each of Australia,^' France,^" and Norway.^' Also, 
seven of the 16 instruments (44%) were used by subsequent 
studies. The most frequently used instrument, 
the McKinley 1997 questionnaire,^" was applied in six 
subsequent studies'^"" and served as a basis for a shortened 
scale (Table 1).^' 

In most studies (14 of 16 instruments, 88%), the questionnaires 
were distributed by email or on paper for 
self-administration. In three studies, both a 
self-administered and an interviewer-administered version 
were used.^"-^'-^^ The number of respondents per study varied 
between 20 and 3,294 persons. Also, 18 of the 31 publications 
(58%) applied instruments in the context of out-of-hours 
services, where centers triage patients from several general 
practices or a specific region.'^"^'-^'"^'-^'"^' 

Eight publications described patient satisfaction 
after the consultation provided by the teleconsultation 
centers. Other settings included: the management 
of same-day appointments;'' the provision of teleconsultation 
services by physicians outside of specialized 
telemedicine institutions;"'^'''^''"' maritime telemedicine;'" 
prison medicine;^'' and teledermatology services.^' The timing 
of instrument administration varied considerably across 
the studies: 16 publications (52%) reported 
the distribution of the questionnaires within 7 days of the 
consultation,''"-^'''^^'^"'''^*'"''"'^^ seven studies (23%) after 
14-28 days," '^'2«-2'*-3'* "-^* and one publication (3%) reported 
a latency of 4-16 months.'" Also, seven (23%) studies did 
not report on the timing of the instrument's administration 
(Table 1).25,29,35,43,45,46 

Content of the instrument 

We assessed the content of the instrument on nine prespeci- 
fied domains. On average, an instrument covered five domains 
(range, three to nine) with 14 items per instrument (range, three 
to 37), and 2.3 items per domain (range, one to 15) (Table 2). 
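
As a small arithmetic check, the averages and ranges reported above can be re-derived from the per-instrument columns "Number of domains covered per study" and "Total number of items" in Table 2. The variable names below are illustrative only; the values are copied from the table in its row order.

```python
from statistics import mean

# Per-instrument values from Table 2 (16 instruments, in table order).
domains_covered = [9, 6, 3, 4, 4, 4, 6, 6, 3, 4, 6, 7, 5, 5, 8, 7]
total_items = [37, 24, 3, 5, 10, 8, 13, 20, 5, 14, 14, 15, 7, 7, 14, 22]

mean_domains = mean(domains_covered)   # average number of domains per instrument
mean_items = mean(total_items)         # average number of items per instrument

# Ranges as (minimum, maximum) pairs.
domain_range = (min(domains_covered), max(domains_covered))
item_range = (min(total_items), max(total_items))
```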

The most frequent domains, covered with at least one 
item, were the "perceived quality of the communication" (14 
of 16 instruments, 88%),''-"-^''-^^-^*-'' followed by the "overall 
satisfaction" (12 of 16 instruments, 75%) n.18.20-26.28-31 and the 
"perceived quality of the telephone advice" (12 of 16 instruments, 
75%).'''' ''"^'■^^"'' The following additional domains were 
covered by more than one-half of the instruments: the "atti- 
tude of the health professional;" the "attitude of the patient;" 
"management;" and "professional skills." This indicated a 
focus of interest across the different instrument development 
teams. Only one instrument covered all nine domains.'* 
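
The coverage figures for individual domains can be recounted from the item counts in Table 2. The sketch below does this for the five domains whose per-instrument counts are listed there (access, attitude of health professional, attitude of patient, communication, individual information). The dictionary keys are shorthand labels chosen for this example, and percentages are rounded half up, which is an assumption about how the reported percentages were rounded.

```python
import math

# Item counts per instrument from Table 2, in the order: access to service,
# attitude of health professional, attitude of patient, communication,
# individual information.
table2 = {
    "Campbell 2007": (1, 3, 1, 5, 15),
    "Dehours 2012": (0, 0, 1, 3, 5),
    "Dixon and Williams 1988": (0, 0, 0, 1, 0),
    "Dixon and Sthal 2009": (0, 1, 0, 2, 0),
    "Garratt 2010": (0, 4, 0, 1, 0),
    "Hicks 2003": (1, 1, 2, 0, 0),
    "Keatinge and Rawlings 2005": (0, 0, 4, 1, 3),
    "McKinley 1997": (2, 1, 1, 6, 0),
    "McKinstry 2002": (0, 0, 1, 3, 0),
    "Mekhjian 1999": (0, 0, 5, 4, 0),
    "Moll van Charante 2006": (1, 2, 0, 5, 0),
    "Moscato 2003": (0, 1, 3, 2, 2),
    "Rahmqvist 2009": (0, 0, 0, 3, 0),
    "Salisbury 2005": (1, 1, 0, 0, 0),
    "Strom 2011": (2, 2, 1, 3, 0),
    "Van Uden 2005": (2, 1, 2, 2, 0),
}
domains = ("access", "attitude_professional", "attitude_patient",
           "communication", "individual_information")


def coverage(rows):
    """Return {domain: (instruments with >=1 item, percent rounded half up)}."""
    n = len(rows)
    result = {}
    for col, name in enumerate(domains):
        covered = sum(1 for counts in rows.values() if counts[col] > 0)
        result[name] = (covered, math.floor(100 * covered / n + 0.5))
    return result


domain_coverage = coverage(table2)
```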

The instruments varied widely in the number of items they 
included per domain. Seven instruments included mostly one 
or two items per domain,* "'"'^'"^'-^'' whereas the instrument at the 
top end included a mean of 4.1 items per domain.'* 

Development process 

Only ten of the 16 instruments (63%) provided details about the 
development process, such as a review of the literature (n=7), 
interviews of patients or stakeholders (n=5), or consultations 
with experts (n=3).'*'^'''^'-^''~^*'''''" Seven studies reported the 
use of more than one method.'*'^"'^'-^*^^*-'' Eleven of 16 studies 
(69%) piloted the instrument. '^"^^•^^"^'•^'"" 
Likert scales were predominantly used for the scoring (seven 
of 16 instruments, 44%). Other rating modes included yes/no 
options (n=2), categorical answers (n=2), numerical rating 
scales (n=3), visual analog scale (n=1), or smiley faces (n=1). 
One instrument included open-ended questions (Table 3).'° 

Recruitment strategy and handling 
of nonresponders 

Nine studies'*-^"-^''^*-^''"^*''''''^ gave information about their 
psychometric properties; therefore, their recruitment strategy 
and handling of nonresponders are further evaluated here. 
Inclusion criteria were comparable, as all studies addressed 



unselected patients who had received teleconsultation and 
triage services. 

All but three publications explicitly reported the consecutive 
recruitment of patients.^"*-^^ " The exclusion criteria 
(five of nine studies, 56%) were related to the feasibility of 
the study (for example, wrong address, serious illness of the 
patient). '^■^'•^*~^* The mean response rate was 60% and varied 
from 100%^2 to 38%.'"' 

The nonresponse analysis in four of nine studies (44%) 
detected sociodemographic but no clinical differences 
between the studies' responders and nonresponders. However, 
these analyses were conflicting. One study reported respon- 
dents to be older and more affluent without any differences 
in sex.'* In two studies, the response rates were lower in 
men invited to participate.^* '"' In a fourth study, women and 
young adults were less likely to participate.^' Forgetfulness 
was identified as the most frequent reason for nonresponse 
(Table 4)."28 

Psychometric properties 

For nine instruments, at least some information about the 
main psychometric properties was reported: item nonre- 
sponse; factor structure; reliability/internal consistency; and 
validity (Table 5). 

1. Item nonresponse: six of the nine studies (67%) reported 
on nonresponses.'*'^"'^''^'*'^'''' In some studies, item nonre- 
sponse was more problematic than in others. For example, 
one study reported complete data from only 43% of the 
respondents,^' while nonresponse rates for individual 
items ranged from a few percent up to about one-fifth of 
the respondents. '*'^''" 

2. Factor structure: seven of the nine studies (78%) 
reported factor structures from a formal factor or princi- 
pal component analysis, '*'^''-^'''^''"^*-" with a multifactorial 
structure and a median of 3.3 factors (range one to six) 
related to teleconsultation per instrument. The factors 
related to: communication ("interaction," "satisfaction 
with communication and management," "information 
exchange," n=5); overall satisfaction (n=3); manage- 
ment ("delay until visit," "initial contact person," "ser- 
vice," n=3); access to service (n=2); attitude of health 
professional (n=2); telephone advice ("product," n=1); 
and individual information ("urgency of complaint;" 
n=1). The correlation between the number of items 
and the resulting number of factors was low (r=0.16). 
For instance, one high-item instrument with 37 items'* 
identified only two factors that explained 72% of the 
variance; whereas, another instrument with 20 items^" 
reported a structure with six factors, which explained 
61% of the variance. 

3. Reliability measures: all nine instruments provided reli- 
ability measures - one study for both the total scale and 
the subscales; two studies for the total scale; and the 
remaining studies for the subscales only. The Cronbach's 
alpha values for the total scales were acceptable,*^ good,^* 
or excellent.^' Cronbach's alpha values for most of the 
factor subscales were above 0.7. However, three of the 
seven studies - evaluating the reliability of the subscales 
- revealed questionable^"-^^ and unacceptable^' Cronbach's 
alpha values for individual subscales. One study pro- 
vided results for inter-item correlation with correlation 
coefficients ranging from 0.45-0.89, indicating a good 
internal consistency of the scale.'* Three studies addition- 
ally reported item-total correlations which ranged from 
0.53-0.92, supporting the internal consistency of these 
instruments. Three publications investigated the test/ 
retest reliability and reported correlation coefficients for 
subscales of >0.7, which are considered satisfactory or 
better. For single subscales, however, correlation 
coefficients were <0.7, indicating limitations in test/ 
retest reliability.'*-^" 

4. Validity measures: in five of the eight instruments 
(63%) the scales correlated well with related constructs 
indicating construct validity. For example, higher 
scores correlated with simple measures of overall 
satisfaction.'*-^"'^'-^*'^' Other scales correlated well with 
the patients' ages, the duration of the consultation, dif- 
ficulties in contact by phone, waiting times, the amount 
of information received during the teleconsultation, the 
fulfillment of expectations or the transfer to a face-to-face 
visit. One study examined the convergent validity and 
found that sub-scores of the instrument were moderately 
correlated.'* Only one of eight studies (13%) investigated 
the concurrent validity by comparing a shortened scale 
with the original instrument and reported modest intra- 
class correlation coefficients of 0.38-0.54.^' 

Discussion 

This systematic review reports on 16 instruments used for 
the multidimensional assessment of 17,797 patients, regard- 
ing patient satisfaction after teleconsultation and triage for 
a health problem. The review identified four instruments 
with comprehensive information on their development and 
psychometric properties. '*'^'''^*-'' 

The selection of the most appropriate instrument will 
probably depend on the purpose of the instrument - whether 



it is intended for routine assessments after a consultation, 
for periodic application as a quality control measure, or as 
a research instrument. For example, a 37-item instrument 
demonstrated good internal consistency and an indication 
of validity. However, the proportion of missing items was 
very large for some items; the test/retest reliability may 
have been limited, and the instrument had only two factors.'* 
This instrument may be selected for research purposes or 
for routine assessments, if multidimensionality is not the 
main focus of the evaluation. Another ten-item instrument, 
in contrast, showed four factors, good internal consistency, 
and construct validity (without evaluating the test/retest 
reliability).^' Due to its brevity and test evaluation results, 
this instrument may be suitable for most purposes. The 
most frequently used instrument (20 items) demonstrated 
high-item completion rates, a six-factorial structure, and 
construct validity. However, several subscales only had a 
very limited internal consistency.^" An alternative 22-item 
instrument with a six-factor structure also showed construct 
validity, with a questionable internal consistency of one 
subscale and without information on the item completion 
rates. ^* However, both instruments may be selected if the 
multidimensionality of patient satisfaction assessment is of 
the utmost importance. 

As only seven instruments used a formal factor analysis 
to identify the relevant underlying constructs, we applied a 
pragmatic approach for attributing the content of the remain- 
ing nine instruments to a list of domains from a systematic 
review." This methodology confirmed the most frequently 
detected domains from the factor analysis ("communication," 
"overall satisfaction," and "management") and identified 
additional domains as relevant for users. These are: "per- 
ceived quality of the telephone advice;" "attitude of health 
professional;" "attitude of patient;" and "professional skills." 
Depending on their specific interests, the coverage of these 
domains may be an additional criterion for users to select 
any of those instruments. 

Although most of the instruments had been developed 
over the last decade - a decade with an increased awareness 
of the need for methodological rigor in psychometric instrument 
development and testing"' - many studies lacked details 
on the development process, had minimal information on the 
instruments' reliability, and only one-half of the instruments 
presented the validity of the existing scales. Factor structure, 
reliability, and validity were only reported for one-quarter 
of the instruments. No study evaluated the extent to which a 
score on the instrument predicts the associated outcome mea- 
sures (predictive validity), which would allow conclusions 
about the patients' adherence to the recommendations or the 
health service use. The recruitment strategy and handling of 
nonresponders were comparable across the studies. 

In his systematic review of patient satisfaction ques- 
tionnaires for out-of-hours care in 2007, Garratt identified 
four instruments that reported some data on reliability and 
validity;^"'^'-^^'^* all were included in this review.'' Garratt 
concluded that all of those studies had limitations regarding 
their development process and their evaluation of psycho- 
metric properties. Even though several years have passed, our 
review has to confirm these limitations. Despite extensive 
searching, we did not find any attempts to further modify, 
reevaluate, and improve the instruments with limited reli- 
ability or redundant items - except in one study. That study 
reduced a 38-item questionnaire^" to a shorter version with 
only eight items.^' Six of the 16 instruments identified in 
this review were published in subsequent years. 
Of these, three instruments reported both methodological 
and psychometric data, two of which provide evidence of 
acceptable reliability and validity."-^' 

Measuring patient satisfaction after teleconsultation and 
triage is a challenging endeavor. The assessment needs to 
focus on the quality of the service without being contami- 
nated by the actions of subsequent health care providers or 
the severity and the natural course of the health problem. For 
instance, timing the administration of the questionnaire can 
be crucial. In the review, the delivery of the questionnaire 
ranged from immediate inquiries to a latency of up to 
16 months postconsultation. There is conflicting evidence 
regarding to what degree the timing of administration may 
confound the measurement of patient satisfaction. Previous 
work suggests that a potential timing effect depends on the 
health status of patients and the initial problem they sought 
help for.'" Applied to our review, this would suggest that the 
optimal timing would be relatively shortly after the teleconsultation 
(ie, <1 week), as longer time intervals may increase 
memory problems for details of the teleconsultation, and the 
course of the medical problem may confound the perceived 
quality of the encounter. 

Our review is based on a comprehensive literature search 
that included expert contacts and no language restrictions. 
Study selection, quality assessment, and data extraction 
with pretested forms - performed independently by two 
researchers - limited bias and transcription errors. Our 
ad hoc analysis of the instruments without formal factor 
analysis confirmed the domains identified in the studies 
with a formal factor analysis and identified additional relevant 
domains with face validity. Our review was limited to 



instruments published in scientific journals. However, more 
instruments are likely to be in use. A recent survey among 
medical academic centers in the USA revealed a frequent 
use of internal instruments.*^ However, if these internal 
instruments had been thoroughly developed and formally 
evaluated, we assume they would have been published in a 
scientific journal. 

If the measurement results are to be used for a com- 
parison of different teleconsultation centers or of physicians 
within these centers or to demonstrate improvements in 
patient satisfaction over time, the instruments must undergo 
rigorous development and evaluation processes. Presently, 
this is the case for only a minority of these instruments. 
For example, the Patient-Reported Outcome Measurement 
Information System (PROMIS) instruments' development 
and psychometric scientific standards provide a set of criteria 
for the development and evaluation of psychometric tests.*' 
Specifically, this includes reporting on the details of the 
development process, including the definition of the target 
concept and the conceptual model, the testing of reliability 
and validity parameters, and the reevaluations after potential 
refinements of the initial instrument. High-quality multidimensional 
assessment instruments should be used consistently 
in future trials to generate valid and comparable evidence 
of patient satisfaction with teleconsultation. This also 
includes a follow-up on patient satisfaction over time. 

Conclusion 

The status of appraisal of the instruments for measuring 
patient satisfaction after teleconsultation and triage - 
identified in the present systematic review - varies from 
comprehensive test evaluations to fragmentary and even 
missing data on factor structure, reliability, and validity. This 
review may serve as a starting point for selecting the instru- 
ment that best suits the intended purpose in terms of content 
and context. It offers pooled information and methodological 
advice to instrument developers with an interest in developing 
the long-needed assessment instrument. 